Week 1 Assignment

Working in R

Download this zipped file and save the (unzipped) folder (learningR) where you can find it on your computer. Within the folder there are:

a project file – learningR.Rproj – double click this to open an RStudio session that will look for and save files in the learningR folder.
a data folder – data/ – that contains three small csv files – step_data_cvl.csv, step_data_alb.csv, step_data_va.csv (and the combined Excel file). These files capture, to the best of my ability, the data used in the Stepping Stones 2019 report as provided by the city. (I’ll share these background files with everyone next week.) The variables in these data sets are in the order in which they appear in the report (so if you aren’t sure what my variable name means, you can compare the order of the variables to the metrics in the document).
a scripts folder – scripts/ – that is currently empty.

Start an R/RStudio session by double-clicking on learningR.Rproj. Start a new .R script (New File –> R Script) that does the following:

Adds comments at the top that define the script's purpose, author, and any other document information you feel should be included.
Loads the tidyverse package.
Reads in all three .csv files (using read_csv()), giving each a relevant name. How many variables and observations does each file have? (Provide answers in comments within the script.)
Examines the structure of each file (using str() or glimpse()). Do the variables have the same type (e.g., characters, numbers, logicals, etc.) across all three data sets?
Generates summary measures of each variable in either the Charlottesville or Albemarle file. Do the variables take on the range of values you expected?
Views the Charlottesville file in the data viewer (e.g., use View() or click on the name of the data frame in the Global Environment pane). Pick any variable you like and sort the data by that variable - in what year is that variable the highest? The lowest? Answer the same question (for the same variable) for the Albemarle file.
Uses the dplyr functions filter() and select(), along with the pipe operator %>%, in a command that starts with the Charlottesville data frame, filters for the year 2016, and selects the variable voter_reg (percent of voting-age-population that is registered to vote). What percent of Charlottesville residents were registered in 2016? Repeat the command for the Albemarle data frame – what percent of Albemarle residents were registered in 2016?
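
If it helps to see the shape of these steps before writing your own script, here is a minimal sketch on a small stand-in data frame. It is not the assignment answer: the numbers in toy are invented placeholders, and the column name year is an assumption (only voter_reg is named above). In your script you would read the real files from the data folder instead.

```r
# Purpose: sketch of the Week 1 workflow on a toy data frame
# Author:  (your name here)

library(tidyverse)

# In the assignment the data come from the project's data/ folder, e.g.
#   cvl <- read_csv("data/step_data_cvl.csv")
# Here a toy tibble stands in for that file. The values are invented
# placeholders, and the column name `year` is an assumption.
toy <- tibble(
  year      = c(2014, 2015, 2016),
  voter_reg = c(70, 72, 75)
)

dim(toy)      # number of observations (rows), then variables (columns)
glimpse(toy)  # the type of each variable
summary(toy)  # summary measures, including the range, of each variable

# Sort by a variable, highest value first
toy %>% arrange(desc(voter_reg))

# Filter for one year, then keep a single variable
reg_2016 <- toy %>%
  filter(year == 2016) %>%
  select(voter_reg)
reg_2016
```

The pipe passes the result of each step to the next function as its first argument, so the last command reads left to right: start with the data frame, keep the rows for 2016, then keep only the voter_reg column.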

Save the script into the scripts folder. When complete, submit this file to me via direct message on Slack (give it a name like week1_mpc.R, as I’ll be adding everyone’s to the same scripts folder in my own version of this folder!)

Artwork by @allison_horst
